ABSTRACT

Time and motion studies constitute a proven approach to understanding and improving any engineering
enterprise. We believe software processes are no different in this respect; however, the fact that
software development yields a collaborative intellectual, as opposed to physical, output calls for
careful and creative measurement techniques.

In  attempting  to  answer  the  question  'where  does  time  go  in  software  development?'  we  have  been 
experimenting  with  two  relatively  uncommon  forms  of  data  collection  in  the  software  development 
field:  time  diaries  and  direct  observation.  This  paper  describes  the  latter  in  which  we  drew  upon 
experimental  techniques  from  the  behavioral  sciences  to  observe  engineers  developing  software  in  a 
large  organization. 

We  have  found  that  both  methods  of  research  are  feasible  and  yield  useful  information  about  time 
utilization. The major source of discrepancy is granularity: most software developers are not capable of
retrospectively  reporting  the  large  number  of  unplanned  interruptions  and  transitory  events  that 
typically  characterize  their  working  day.  We  were  able  to  quantify  the  effect  of  such  social  processes 
using  the  observational  data. 


1.  Introduction 

In  1975,  Frederick  P.  Brooks,  Jr.  described  software  construction  as  inherently  a  systems  effort,  "an 
exercise  in  complex  (human)  inter-relationships"  [Brooks75].  Yet  now,  almost  twenty  years  later,  most 
experiments still focus on the mechanical aspects of programming, not the social.¹ This focus continues
despite the fact that Brooks and other leading researchers in the field have continued to emphasize the
importance of the human aspects in system development [Brooks80; Brooks87; Curtis86; Ershov72;
Leves92;  Shneid86;  Weinb71],  and  despite  the  fact  that  interim  process  studies  have  indicated  a  large 
amount  of  unexplained  variance  in  human  performance  [Benj92;  Boehm88;  Brooks80;  Cusum91;  Sack68; 
Vos84]. 

The  few  studies  that  have  investigated  the  human  aspects  of  programming  have  relied  primarily  on 
student  programmers  or  artificial  tasks  in  laboratory  settings  [Curtis86].  Although  these  have  been 
informative  and  useful,  their  relevance  to  large  scale  software  development  is  being  increasingly 
questioned.  How  representative  are  such  samples  and  tasks?  What  kinds  of  problems,  unique  to 
organizational environments, are being ignored by focusing on these small and artificial domains?²

For  example,  work  exists  in  a  temporal  context.  How  time  is  partitioned,  scheduled  and  used  have  both 
dramatic  and  subtle  influences  on  organizations  and  the  people  in  them  [McG90;  Schr87;  Vinton92]. 
Within  the  domain  of  software  development,  much  recent  effort  and  attention  have  been  devoted  to  the 
need  for  increased  market  response  time.  Yet  there  has  been  surprisingly  little  research  on  time  related 
behavior  at  the  individual  level  and  on  the  connection  between  individual  actions  and  an  organization's 
ability  to  act.  The  empirical  evidence  on  the  reverse  relationship  is  similarly  unsatisfactory.  Does  time 
pressure  spur  people  to  improved  productivity  [Andr64;  Andr76]  or  create  an  overload  of  competing  tasks 
[Pelz63]? 


1. It is an interesting historical question as to why this focus has occurred. Some researchers attribute it to the EE background of the
discipline [Kraft79]. Others view it as an example of choice of research methods determining the questions to be studied
[Brooks80].

2.  One  unfortunate  result  of  this  narrow  focus  is  that  the  few  studies  that  have  investigated  programmers  have  generated  an  enormous 
amount  of  myth  that  needs  to  be  challenged  [Weinb71].  We  will  have  some  comments  to  make  about  such  accepted  truisms  as:  (1) 
developers don't like to be (and cannot be) observed and (2) software programming is an isolated type of activity.


The literature yields a history of provocative (and troubling) anecdotes on time utilization during
software  development.  In  particular,  several  prominent  authors  have  written  that  a  significant  proportion  of 
project  effort  is  devoted  to  non-programming  activities,  with  some  estimates  indicating  as  much  as  50%  of 
a  work  week  typically  absorbed  by  machine  downtime,  meetings,  paperwork,  company  business,  sick  and 
personal  days,  leaving  little  time  for  actual  programming  and  debugging  efforts  [Boehm88;  Brooks75; 
Mayer68].  A  close  examination  of  the  references,  however,  seems  to  indicate  that  all  of  these  claims  are 
based  on  one  unpublished  1964  dissertation  by  E.F.  Bairdain  on  how  software  developers  spend  their  time 
[Mayer68].  Details  of  the  sample  and  methodology  used  to  generate  these  findings  are  not  readily 
available. 

The  present  study  is  therefore  part  of  an  on-going  effort  to  understand  what  professional  software 
developers  actually  do  as  opposed  to  what  they  say  they  do  or  are  thought  to  do  [Weinb71].  We  focus  on 
time  as  a  critical  yet  under-explored  aspect  of  life  in  software  organizations. 

In  an  earlier  study  (a  prototype  for  our  current  large-scale  experiment)  we  analyzed  data  on  one 
software  developer's  daily  activities  through  the  use  of  a  retrospective  diary  [Brad93a].  In  our  current 
experiment, we are gathering data on a large sample of developers via daily retrospective self-reports
[Brad93b]. This paper describes a direct observation methodology we used to assess
informant  biases.  We  found  that  although  the  self-reported  data  is  well  calibrated  to  observations  when  used 
for  macroscopic  analyses,  it  often  fails  to  reflect  what  an  individual  is  doing  at  a  finer  resolution.  This  is 
largely  due  to  the  fact  that  software  developers  vary  in  the  degree  of  granularity  they  apply  to  their  self- 
reports  and,  in  general,  do  not  report  the  unanticipated  work  requests  and  unplanned  interruptions  they  both 
initiate  and  field  during  the  course  of  a  working  day.  Although  these  types  of  interactions  are  typically  of 
minor  duration,  they  collectively  impact  the  development  process  to  a  significant  degree. 

In  the  next  section,  we  briefly  review  our  prior  work  and  choice  of  methodologies  before  posing  specific 
research  questions.  In  Section  3,  we  describe  the  experimental  design  and  execution  of  the  observational 
study. The data analysis is divided into two parts: in Section 4, we first calibrate observations to self-reports
and develop a variance factor; in Section 5, we investigate the principal source of variance: the transitory
interactions not reported by most developers. We conclude with recommendations of how to use the data
and with proposals for further research.

2.  Motivation 

2.1  Methodological  Approach 

Time  studies  have  a  rich  history  and  have  been  extremely  valuable  to  our  understanding  of  how  to 
organize  work,  and  how  to  build  complex  services  and  products  [And64;  McGr90;  Pars74].  The  traditional 
time  study  breaks  a  task  into  a  set  of  subtasks  at  the  individual  or  project  level  and  tracks  the  sequence  of 
activities  and  the  elapsed  time  interval  until  completion.  When  combined  with  cost  information,  a  model  of 
the  task  can  then  be  assembled  that  enables  an  organization  to  reduce  cost  and  production  time  by  more 
effectively  focusing  its  technology  and  resources.  Such  a  model  can  also  alert  managers  to  problems 
associated  with  the  process  such  as  scheduling  bottlenecks  or  instances  when  the  process  is  being 
circumvented. 

This  paper  builds  on  earlier  work  in  which  we  designed  a  modified  time  card  and  asked  software 
developers  to  record  their  daily  activities.  Our  preliminary  studies  are  described  by  Bradac  et  al.  [Brad93a; 
Brad93b],  and  a  sample  of  the  survey  instrument  is  shown  in  Appendix  A.  Although  periodic  interviews 
and  occasional  unannounced  visits  had  convinced  us  that  no  conscious  misrepresentation  occurred,  we 
sought to check the reports of time usage submitted by software developers via direct observation.³ As
noted  above,  a  few  prior  studies  have  reported  aggregate  statistics  on  time  usage  in  software  development 
but  none,  to  our  knowledge,  have  been  based  on  actual  observation.  An  important  and  significant  precedent 
for  such  an  approach,  however,  is  work  done  in  the  early  1960's  in  Japanese  software  factories  [Cusum91]. 

3. For example, subjects may unintentionally forget significant events when under the pressure of production. Also, we are implicitly
relying on a subject's definition of what was most significant about a day consisting of (often) many different events. Finally, we
were concerned about the consistency of the survey instrument, both across and within subjects, and the adequacy of the data
resolution. H. Russell Bernard et al. give an excellent summary of much of the informant accuracy research [Bern80].

Alternative methods do exist, and we considered using an observer with a video camera instead of an
observer taking notes. Although there are precedents for using video cameras [Guin90a; Guin90b], we felt
it would be inappropriate for a number of reasons. First, our study population was not used to being
observed. The subjects were receptive to the notion of participating in an experiment and quickly became
comfortable with the observer, but the introduction of video equipment would have distorted their behavior
(not to mention that of their peers and overall work progress). Second, over 300 hours of video tape would
have to be watched and interpreted, significantly increasing the cost and duration of the experiment.⁴
Finally, we were interested in obtaining information about why developers used their time the way they
did: why they made certain choices and how they decided among competing demands on their time. Use
of video would have precluded our asking questions about the subject's choices.

This  approach  in  and  of  itself  cannot  (necessarily)  tell  us  what  developers  should  do  nor  what  they  are 
capable  of  accomplishing.  Yet,  as  noted  by  Wolf  et  al.,  "in  order  to  improve  processes  and  design  new  ones, 
it  is  necessary  to  obtain  concise,  accurate  and  meaningful  information  about  existing  processes"  [Wolf93]. 
That  is,  by  understanding  how  and  why  programmers  use  their  time  the  way  they  do,  we  will  be  better 
positioned  to  identify  tools  and  methods  that  enable  them  to  perform  tasks  in  less  time. 

2.2  Research  Questions 

The  specific  research  questions  we  sought  to  address  were  as  follows. 

1. To what extent are the daily retrospective self-reports an accurate representation of what
actually  happened  during  the  developer's  day? 

—  what  are  the  sources  and  extent  of  bias? 

—  what  are  the  strengths  and  weaknesses  of  the  two  methodologies? 

2.  What  types  of  events  are  not  being  captured  in  the  self-reports  and  how  significant  are  they? 

3.  Observing  Software  Developers  in  Action 

The  first  part  of  this  section  describes  the  experimental  setting,  selection  of  study  subjects,  and  design. 
We  then  proceed  to  describe  the  actual  execution  of  the  experiment.  This  includes  both  the  initial 
preparation  of  the  study  subjects  and  research  procedures  as  well  as  the  data  collection  proper. 


4. Note that we have touched on an important point in experimental design: the costs versus the accuracy of the experiment. This is a
common tradeoff problem when designing a study. Although outside the major theme of this paper, it is important to realize that
our cost for doing this experiment was ≈700 person hours at $60 per person hour. At a minimum, reviewing 300 hours of tape
would have increased our costs by ≈43% (assuming we viewed all 300 hours of tape).
3.1 Experimental Setting

The  subjects  for  our  time  studies  build  software  for  a  real  time  switching  system  [Cols92].  The  system 
is  a  successful  product  with  over  10  million  noncommentary  source  lines  (NCSL)  of  C  code,  divided  into 
41  different  subsystems.  New  hardware  and  software  functionality  are  added  to  the  system  approximately 
every  15  months. 

A unit of functionality that a customer will pay for is called a feature; features are the fundamental units
tracked by project management. They vary in size from a few NCSL with no hardware developed to 50,000 NCSL
with many complex hardware circuits developed. Most software is built using a real time operating system.

The  development  organization  responsible  for  product  development  consists  of  approximately  3,000 
software  developers  and  500  hardware  developers.  This  software  development  organization  is  currently 
registered to the ISO 9001 standard [ISO9001] and has been assessed as an SEI level 2 development
organization  [Hump89]. 

3.2  Selection  of  Study  Subjects 

Five  software  developers  were  chosen  at  random  from  the  group  participating  in  the  self-reporting 
experiments.  No  one  who  was  asked  refused  to  participate.  The  remaining  10  developers  from  the  self- 
reporting  experiment  served  as  one  control  group,  allowing  us  to  assess  the  impact  of  observing  on  self- 
reporting.  We  also  had  nine  months  of  prior  diary  reports  on  these  two  groups  which  could  be  used  for 
comparison  with  post-observation  entries.  Two  software  developers  who  were  not  part  of  the  self-reporting 
experiments  were  also  included  in  the  observation  experiment  as  an  additional  form  of  control:  comparison 
with  the  other  subjects  will  enable  us  to  assess  the  impact  of  self-reporting,  given  observation. 

We  applied  a  purposive  sampling  scheme,  selecting  subjects  at  random  yet  stratifying  along  those 
dimensions  we  felt  would  be  most  significant.  The  two  major  factors  we  felt  were  most  important  to  control 
for  were  (1)  the  project  factors  of  organization,  project  phase  and  project  type  and  (2)  the  personal  factors 
of  age,  gender,  race,  individual  personality  and  years  of  experience. 


Our  goal  was  not  to  conduct  a  comparative  study  but  rather  to  obtain  a  broad  base  of  observations  and  to 
decrease the likelihood of idiosyncratic findings.⁵

In  summary,  our  sample  consisted  of  one  treatment  and  two  control  groups: 

1.  Treatment  Group  1  contained  subjects  who  had  participated  in  the  self-reporting  experiments  for  9 
months.  They  continued  to  complete  their  time  diaries  and  were  simultaneously  observed. 

2.  Control  Group  1  contained  subjects  who  had  participated  in  the  self-reporting  experiments  for  9 
months.  They  continued  to  fill  out  their  diaries  but  were  not  observed. 

3.  Control  Group  2  contained  subjects  who  had  not  participated  in  the  self-reporting  experiments  but 
were  observed. 

3.3 Experimental Design

We  applied  a  social  experimental  design  that  allowed  us  to  efficiently  control  many  factors  with  as  few 
observations  as  possible  and  to  determine  whether  the  presence  of  an  observer  significantly  changed  the 
developer's  behavior.  It  combined  elements  from  three  standard  behavioral  science  experimental  designs: 
(a)  a  partial  factorial  design,  (b)  a  repeated  measure  design,  and  (c)  a  replicated  interrupted  time  series 
design  [Judd91].  Figure  1  depicts  the  2x3  partial  factorial  design.  The  two  independent  variables  are 
observation  predictability  and  observation  type.  The  design  is  partial  because  we  collect  data  in  only  four 
of  the  six  possible  cells. 
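
As a rough illustration of what "partial" means here, the sketch below enumerates the six cells of the 2x3 grid and flags the four in which observations were planned. The constant and function names are illustrative assumptions and were not part of the study's tooling.

```python
from itertools import product

# Levels of the two independent variables (labels follow Figure 1).
PREDICTABILITY = {1: "Scheduled", 2: "Unscheduled"}            # Variable X
OBSERVATION_TYPE = {1: "Full day observing",
                    2: "One hour with questions",
                    3: "One hour without questions"}           # Variable Y

# Cells in which data collection was planned; scheduled snapshots are excluded.
PLANNED_CELLS = {(1, 1), (2, 1), (2, 2), (2, 3)}


def design_cells():
    """Map every (x, y) cell of the full 2x3 grid to True if it was planned."""
    return {(x, y): (x, y) in PLANNED_CELLS
            for x, y in product(PREDICTABILITY, OBSERVATION_TYPE)}


cells = design_cells()
planned = sorted(cell for cell, used in cells.items() if used)
print(f"{len(planned)} of {len(cells)} cells planned: {planned}")
```

Listing the empty cells explicitly makes it easy to see which comparisons the design cannot support, namely any that involve scheduled one-hour snapshots.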

The  intent  behind  the  first  variable  (observation  predictability)  was  twofold.  First,  we  hoped  that  by 
observing at random we would hinder the study subjects from adjusting their schedule so as to favorably
impress  the  researcher.  Second,  we  sought  to  give  the  study  subjects  some  sense  of  control  over  the 
experience.  Prior  research  in  psychology  has  confirmed  the  notion  that  having  control  (or  the  perception 
thereof)  over  one's  environment  produces  physical  and  psychological  well-being  [Langer89]. 


5. We are aware that our sample size is quite small and probably inadequate for statistical validity but, following the logic of
[Brooks88], we believe "any data is better than none." This work falls within the second category of nested results Brooks cites as
necessary and desirable for progress in software development: reports of facts of real user behavior even though observed in
under-controlled, limited sample experiences. What distinguishes this from just "any data" is the methodological rigor we sought to
apply to the study and the relevance of the results to real world software development.


                                  Variable Y (Observation Type)

Variable X                        Full Day     One Hour          One Hour
(Observation Predictability)      Observing    With Questions    Without Questions
                                  (Y = 1)      (Y = 2)           (Y = 3)

Scheduled (X = 1)                 X1Y1         none              none

Unscheduled (X = 2)               X2Y1         X2Y2              X2Y3


Figure  1  2x3  Partial  Design  for  Observing  Software  Developers 

The  figure  displays  the  independent  variables  of  the  original  experiment  definition.  This 
was  later  simplified  by  using  only  full  day  observations,  for  reasons  described  in  the  text. 
X  represents  the  independent  variable  of  observation  predictability:  scheduled  and 
unscheduled  by  the  subject.  Y  represents  the  observation  types:  a  full  day,  snapshot  with 
questions,  and  snapshot  only.  The  original  design  is  partial  because  no  scheduled 
observations  involving  snapshots,  either  with  or  without  questions,  are  performed. 


The  use  of  multiple  controls  was  deliberate  and  deserves  further  explanation.  We  were  particularly 
cognizant  of  the  so-called  "Hawthorne  Effect,"  the  notion  that  the  mere  fact  of  having  subjects  self-report 
and/or  be  observed  might  alter  their  behavior  and  distort  our  conclusions  [Pars74].  We  therefore  built  in 
several  alternative  control  mechanisms  in  order  to  assess  the  significance  of  these  possible  distortions.  This 
included  both  standard  hold-out  samples  (Control  groups  1  and  2)  as  well  as  features  of  the  experimental 
design.  In  particular,  by  observing  our  subjects  more  than  once  under  each  condition  (repeated 
measurements),  each  subject  served  as  his/her  own  control.  The  replicated  interrupted  time  series  aspect  of 
the  design  further  allowed  us  to  compare  the  same  subject  over  time  and  different  subjects  at  the  same  time. 
This  enabled  us  to  assess  potential  threats  to  validity  posed  by  maturation,  history  and  testing.  Finally,  as 
noted  earlier,  we  had  extensive  pre-observation  self-reported  data  to  compare  with  that  reported  once  the 
observation  experiment  began. 

The  original  design  was  quickly  modified  to  only  a  full  day  observation  type  (collapsed  into  a  2x1 
design),  for  three  reasons:  (1)  the  subjects  were  in  three  different  locations,  creating  excessive  travel  costs 
for  the  observer;  (2)  the  arrival  of  an  observer  in  the  middle  of  the  day  created  too  many  awkward  warm  up 
periods and thus unduly interfered with on-going work; and (3) it was often necessary to observe a whole
day  to  properly  interpret  a  single  event  and  the  subject's  response  to  it. 

Each  subject  was  observed  a  total  of  five  full  days  (9-10  hours  per  day,  on  average,  for  315  to  350  total 
hours  of  observation  over  a  12  week  period).  Two  of  the  five  days  were  chosen  and  scheduled  by  the 
subject.⁶ The remaining three days were assigned by random draw without replacement, and the subjects
were not informed of when they would occur.⁷
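
The scheduling rule above can be sketched as follows. This is a minimal illustration and not the procedure actually used: the day numbering, the 60-day pool (12 weeks of 5 working days), the example scheduled days, and the exclusion of the subject's own choices from the draw are all assumptions made for the example.

```python
import random


def draw_unscheduled_days(candidate_days, subject_chosen, n_random=3, seed=None):
    """Draw n_random unscheduled days, without replacement, from the
    candidate days left after removing the subject's scheduled choices."""
    chosen = set(subject_chosen)
    remaining = [d for d in candidate_days if d not in chosen]
    if len(remaining) < n_random:
        raise ValueError("not enough candidate days left to draw from")
    rng = random.Random(seed)
    return sorted(rng.sample(remaining, n_random))  # sample() never repeats a day


# Hypothetical pool: 12 weeks x 5 working days, numbered 1..60.
working_days = list(range(1, 61))
scheduled = [7, 33]                       # two days picked by the subject (made up)
unscheduled = draw_unscheduled_days(working_days, scheduled, seed=1)
print(scheduled, unscheduled)

# Total observation hours implied by the text: 7 subjects x 5 days x 9-10 hours.
SUBJECTS, DAYS_EACH = 7, 5
print(SUBJECTS * DAYS_EACH * 9, SUBJECTS * DAYS_EACH * 10)   # 315 350
```

In practice the candidate pool would also have to exclude vacations and other blocked-out days before the draw is made.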

3.4  Experimental  Definition 

We  went  through  several  important  steps  to  prepare  for  observing  the  software  developers.  We  refer  to 
these  as  the  experimental  definition  since  they  involved  assembling  information  and  defining  procedures  for 
our  observations. 

3.4.1 Preparation of and Guidelines for Observation Participants

Because  we  were  aware  that  some  people  might  be  uncomfortable  about  being  observed,  we  spent 
considerable  time  beforehand  explaining  the  purpose  of  the  study  to  the  subjects.  We  positioned  the 
observer  as  a  student — there  to  learn  how  the  subject  spends  his/her  time  when  doing  software 
development.  The  subjects  were  reminded  that  there  are  no  right  or  wrong  ways  to  work  (i.e.,  our  purpose 
was  not  to  judge  but  to  understand  behavior  within  a  given  environment). 

We  were  particularly  sensitive  to  issues  of  anonymity  and  confidentiality  [Judd91].  All  data  was  entered 
under  an  ID  code  known  only  to  the  researchers.  Each  subject  was  also  given  a  list  of  his/her  rights:  the 
right  to  halt  or  discontinue  observations  at  any  time  or  withdraw  from  the  study  altogether;  the  right  to 
examine  the  observer's  notes;  and  the  right  to  ask  the  researcher  not  to  record  something.  None  of  these 
situations occurred.⁸


6. Interestingly, subjects often forgot when they had scheduled such sessions and were subsequently surprised to see the researcher in
the morning. This reassures us that subjects were not too intimidated by the prospect of being observed.

7. The logistics behind this were not trivial. For example, vacations had to be blocked out in advance, and the observer had to adjust
her schedule to accommodate subjects who worked flexible hours. Many lab sessions were also conducted off-hours, and a
procedure was established for what to do in the event that a developer did not come into work.

8. In fact, the only person who asked to look at the notes was a manager who was extremely worried about the state of his (behind
schedule) feature.


3.4.2  Data  Checklist 

To ensure that information about the software developers was uniform, we created a checklist for data
collection.  It  consisted  of  demographic  information  (age,  gender,  race,  family  obligations  outside  of  work); 
educational  and  professional  experience;  current  organization  and  project  status;  work  habits;  job  security 
level;  and  overall  job  satisfaction.  This  data  was  collected  in  a  one-hour  interview. 

We  also  administered  three  survey  instruments:  a  Myers  Briggs  personality  assessment  [Keir84]; 
Kaufman, Lane and Lindquist's Polychronic Attitude Index (PAI); and Bluedorn's
Monochronic/Polychronic Orientation Scale [Blue92]. The PAI attempts to capture an individual's general
attitude toward performing more than one activity at a time. The Bluedorn scale measures the extent to
which  a  department  or  organization  is  polychronic  and  was  given  to  each  subject,  his/her  immediate 
manager,  and  a  peer. 

Finally,  we  copied  each  subject's  desk  calendar  for  two  months  and  noted  all  scheduled  meetings, 
classes  and  vacation  days. 

3.4.3  Rules  for  the  Researcher 

The  observer  endeavored  to  function  as  a  "human  camera,"  recording  everything  with  as  few  initial 
preconceptions  about  relative  importance  as  possible.  She  was  also  required  to  frequently  read  a  half  page 
list  of  reminders  designed  to  keep  the  observation  procedure  consistent.  Because  a  single  observer  was 
used for all seven subjects, issues of inter-observer variability were avoided [Judd91].

We  used  continuous  real  time  recording  for  non-verbal  behavior  and  interpersonal  interactions.  During 
those  interims  when  a  developer  was  working  at  the  terminal,  we  used  a  time  sampled  approach:  asking  the 
developer  at  regular  intervals  "what  are  you  doing  now?"  Daily  observations  were  recorded  in  small  spiral 
notebooks,  unique  to  each  subject.  Each  evening,  the  raw  notebook  observations  were  converted  to 
standard  computer  files.  This  allowed  us  to  readily  fill  in  observations  while  still  fresh  and  served  as  the 
basis  for  two  kinds  of  summary  sheets  described  below.  As  data  came  in,  it  was  added  to  a  loose-leaf 
notebook,  with  separate  sections  for  each  subject.  This  helped  us  stay  organized  over  time  and  facilitated 
communication  among  the  researchers. 
Such an inductive method of analysis, which involved the joint collection, coding and analysis of data,
has  been  cited  as  one  of  the  best  ways  to  discover  theory  from  data  [Glaser67]. 

3.5  Miscellaneous  Remarks  About  Observing 

Almost  without  exception,  the  first  response  to  this  study  was  "You  just  want  to  observe  me?  There's 
nothing  to  see"  (from  subjects)  or  "How  boring"  (from  fellow  researchers).  In  fact,  nothing  could  be  further 
from  the  truth.  As  described  below,  a  software  developer's  day  was  filled  with  events  and  activities  that 
varied  considerably  across  individuals  and  time.  The  observer  accompanied  the  subjects  everywhere  (e.g., 
to  group  and  department  meetings,  affirmative  action  presentations,  tutorials,  lab  sessions)  and  witnessed  a 
broad  spectrum  of  events  (e.g.,  promotions,  mental  and  physical  exhaustion,  celebrations).  In  addition  to 
observing,  there  were  plenty  of  opportunities  to  talk  with  the  developers  and  their  colleagues  about  life  in 
the  organization,  and  we  found  them  to  be  remarkably  willing  to  do  so. 

4.  Calibration  Analysis 

In  any  experiment,  the  researcher  needs  to  understand  the  sources  of  variance  in  the  data.  This  allows 
the  researcher  to  build  a  probability  model  of  the  experiment  and  answer  the  question,  "What  is  the 
probability  of  being  wrong?"  Furthermore,  it  enables  other  researchers  to  reproduce  the  experiments  and  to 
interpret  agreements  or  disagreements.  To  make  the  software  development  process  more  efficient,  we  need 
to  gather  data  on  how  long  it  takes  developers  to  perform  given  tasks,  and  we  need  to  know  how  reliable 
this data is. The sources and properties of variance are also important in their own right [Brooks80].

4.1  Description  of  Data 

In  order  to  calibrate  software  developers'  assessment  of  their  own  usage  of  time  we  compared  two  sets 
of  data — the  self-reported  data  and  the  observer's  notes.  Figure  2  displays  one  comparison  sheet,  with  a 
software  developer's  report  on  the  left  (see  example  in  Appendix  B)  and  our  summarized  observations  on 
the right. It is important to recognize that the observer-recorded data contains an impressive amount of
micro-level detail, often down to three minute intervals.⁹ In order to form an effective comparison, we
therefore summarized that detail into major blocks of activities.¹⁰ Note that in forming these interim
summaries, we have ignored many transitory, short duration events which the developers had to initiate and
respond to during the course of a working day. Those types of interactions are analyzed separately in
Section 5.

9. Wolf and Rosenblum note that an appropriate level of granularity is very important and have designed and used an alternative
direct observation method that is optimized for capturing and correlating short duration events with the actions of people [Wolf93].
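
One way such a summarization could be mechanized is sketched below. The rule shown (absorb any event shorter than a threshold into the preceding block, then merge adjacent blocks with the same activity) and the 10-minute threshold are assumptions chosen for illustration; the summaries in this study were prepared by the researchers, and no fixed cutoff is stated.

```python
from dataclasses import dataclass


@dataclass
class Event:
    start: int       # minutes since midnight
    end: int
    activity: str


def summarize(events, min_minutes=10):
    """Collapse a fine-grained event log into major activity blocks.

    Events shorter than min_minutes are treated as transitory and absorbed
    into the preceding block; adjacent blocks with the same activity merge.
    """
    blocks = []
    for ev in sorted(events, key=lambda e: e.start):
        short = (ev.end - ev.start) < min_minutes
        if blocks and (short or blocks[-1].activity == ev.activity):
            blocks[-1].end = max(blocks[-1].end, ev.end)    # extend previous block
        else:
            blocks.append(Event(ev.start, ev.end, ev.activity))
    return blocks


# Hypothetical log: a 3-minute phone call interrupts a design session.
log = [
    Event(540, 575, "design"),
    Event(575, 578, "phone call"),
    Event(578, 610, "design"),
    Event(610, 621, "break"),
]
for block in summarize(log):
    print(block)
```

With this log the phone call disappears into a single 0900-1010 design block, which is exactly the kind of transitory event the summaries set aside for separate analysis.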


DIARY                                     OBSERVER

0800 - 1800  Working High Level Design    0800 - 0900  Administration
                                          0900 - 1010  High level design analysis
                                          1010 - 1021  Break
                                          1021 - 1135  Code experiment with peer
                                          1135 - 1226  High level design document writing
                                          1226 - 1314  Lunch in cafeteria
                                          1314 - 1330  Answer document question (responsible person out)
                                          1330 - 1349  Answer growth question
                                          1349 - 1406  Reading results of Business Unit Survey
                                          1406 - 1500  Code experiment with peer
                                          1500 - 1626  Searching for paper and reading
                                          1626 - 1701  Code experiment with peer
                                          1701 - 1705  Administration

Figure  2  Comparison  Sheet  Example 

Report form comparing a software developer's self-reported time diary with the
observer's summarized notes. This sheet is typical of the calibration. Note the difference
in end time between the diary and the observer's notes: ≈55 minutes. The diary contains
one entry for this 9-10 hour day, "Working High Level Design." The observer had 13
entries of which about 5 hours corresponded to activities associated with high level
design.
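
The figures quoted in the caption can be checked directly from the sheet. The sketch below transcribes the Figure 2 entries and computes the entry count, the end-time difference, and the minutes per activity label; the helper names are illustrative. Which labels count toward the "about 5 hours" of high level design is a coding judgment that the sheet itself does not spell out, so the code only reports per-label totals.

```python
def minutes(hhmm):
    """Convert a 24-hour 'HHMM' string to minutes since midnight."""
    return int(hhmm[:2]) * 60 + int(hhmm[2:])


# Entries transcribed from Figure 2 as (start, end, activity).
DIARY = [("0800", "1800", "Working High Level Design")]
OBSERVER = [
    ("0800", "0900", "Administration"),
    ("0900", "1010", "High level design analysis"),
    ("1010", "1021", "Break"),
    ("1021", "1135", "Code experiment with peer"),
    ("1135", "1226", "High level design document writing"),
    ("1226", "1314", "Lunch in cafeteria"),
    ("1314", "1330", "Answer document question"),
    ("1330", "1349", "Answer growth question"),
    ("1349", "1406", "Reading results of Business Unit Survey"),
    ("1406", "1500", "Code experiment with peer"),
    ("1500", "1626", "Searching for paper and reading"),
    ("1626", "1701", "Code experiment with peer"),
    ("1701", "1705", "Administration"),
]


def totals(entries):
    """Sum the minutes recorded under each activity label."""
    out = {}
    for start, end, activity in entries:
        out[activity] = out.get(activity, 0) + minutes(end) - minutes(start)
    return out


end_gap = minutes(DIARY[-1][1]) - minutes(OBSERVER[-1][1])
observer_totals = totals(OBSERVER)
print(len(OBSERVER), "observer entries")            # 13
print(end_gap, "minutes between reported and observed end of day")   # 55
for activity, mins in observer_totals.items():
    print(f"{mins:4d}  {activity}")
```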


4.2  Calibrating  Observations  with  Self-Reports 

We  address  two  questions  in  the  calibration  analysis:  how  accurate  are  the  self-reports  in  terms  of  (1) 
time  worked  and  (2)  what  actually  happened  during  that  time?  We  can  then  proceed  to  build  a  correction 
factor  for  the  individual  self-reports. 

Our  error  model  for  the  self-reports  has  two  components.  The  first  component  we  call  normalization. 
Figure  3  plots  the  total  working  time  (in  minutes)  each  software  developer  reported  versus  that  observed. 
The  45  degree  line  represents  perfect  agreement  between  observer  and  subject;  those  points  falling  to  the 
left of the 45 degree line represent days in which a subject under-represented total working time while those
to the right are instances where a developer actually worked less than he/she reported. As seen in Figure 3,
the majority of the observations are clustered around 500 to 550 minutes or 8.3 to 9.2 working hours.¹¹ An
obvious exception is subject 2A who twice worked more than 11 hours. He is an acknowledged local expert
often called upon to help solve critical issues in the lab. Note that most subjects appear to be relatively
consistent in the accuracy of their reporting. Subject 2B, for example, tends to over-represent total working
time by about one hour whereas subjects 1B and 1C are remarkably accurate.


10. We verified the reliability of the summary process by randomly comparing reports prepared by independent researchers. The level
of comparability was well within accepted research standards.

Figure 3 Comparison of Reports of Total Working Time

This  plot  compares  each  software  developer's  self-reported  total  time  at  work  (in 
minutes)  with  that  actually  observed.  Each  developer  is  identified  by  a  unique  ID  (e.g., 
2A),  with  5  data  points  (observation  days)  per  individual.  The  45  degree  line  indicates 
perfect  agreement  between  observer  and  subject.  A  subject  is  said  to  under  (over)  report 
when  their  reported  time  is  less  (greater)  than  that  observed  (above  (below)  the  line).  On 
two  occasions  subject  2A  under  reported  his  work  time.  The  average  amount  of  over- 
represented  working  time  was  2.8%. 


11. These calculations merely compare begin/end times; we have not subtracted out breaks or lunches from the totals. In general, there
was a strong correlation in the reliability of all such measures; i.e., a developer who accurately reported total time and activity also
tended to accurately record the number and duration of his breaks.

The average amount of over-represented working time, that is, the positive difference between the subject's
reported work time in a given day and that recorded by the observer (as a function of what the observer
saw), was 2.8%.
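
As a rough illustration of the normalization component, the following sketch computes the over-reported share for a single day from begin and end times only (breaks and lunch are not subtracted, as in the comparison above). The times are those of the Figure 2 example day; the function names and the use of the observed total as the denominator are assumptions made for illustration, not a statement of the exact calculation used in the study.

```python
def minutes(hhmm: str) -> int:
    """Convert a 24-hour 'HHMM' clock string to minutes since midnight."""
    return int(hhmm[:2]) * 60 + int(hhmm[2:])


def over_report(diary: tuple[str, str], observed: tuple[str, str]) -> tuple[int, float]:
    """Return (difference in minutes, difference as a fraction of observed time).

    Positive values mean the diary claims more time than the observer recorded.
    Dividing by the observed total is an assumption of this sketch.
    """
    reported_total = minutes(diary[1]) - minutes(diary[0])
    observed_total = minutes(observed[1]) - minutes(observed[0])
    diff = reported_total - observed_total
    return diff, diff / observed_total


# Begin/end times from the Figure 2 example day.
diff, share = over_report(diary=("0800", "1800"), observed=("0800", "1705"))
print(diff, round(100 * share, 1))  # 55 10.1
```

For this one day the diary overstates the observed span by 55 minutes, or roughly 10% of the observed time; averaging such daily shares over the study days is one plausible way to arrive at a summary figure.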

The second component of the error model is the fidelity of the self-reports. If we were to try to
reconcile a subject's self-reported list of activities on a given day with that actually observed, to
what extent would the two viewpoints agree? We obtained such a measure by dividing the number of
minutes that a subject and the observer agreed about what the developer was actually doing by the total
number of minutes worked that day (as recorded by the observer).

Figure  4  displays  that  fraction  plotted  as  a  function  of  the  date  the  subject  was  observed.  The  vertical 
line segments represent the confidence bounds around each ratio.¹² The agreement between the observer
and subjects ranged from 0.95 to 0.58 for subjects 1B and 2B respectively. The clusters further indicate that
the  variation  between  subjects  is  greater  than  the  variation  within  any  subject's  set  of  observations. 
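
The fidelity ratio and its bounds can be sketched as follows. The sketch assumes that each observed activity block carries a duration and one of the three labels (agree, disagree, maybe), that the lower bound counts only agreed minutes, and that the upper bound also counts the undecided minutes. The function name, the label strings, and the sample day are hypothetical.

```python
from collections import Counter


def fidelity(blocks):
    """blocks: iterable of (minutes, label), label in {'agree', 'disagree', 'maybe'}.

    Returns (low, high): agreed minutes over total observed minutes, without
    and with the undecided 'maybe' minutes counted as agreement.
    """
    totals = Counter()
    for mins, label in blocks:
        totals[label] += mins
    total = sum(totals.values())
    low = totals["agree"] / total
    high = (totals["agree"] + totals["maybe"]) / total
    return low, high


# Hypothetical day with 500 observed minutes.
day = [(300, "agree"), (120, "disagree"), (50, "maybe"), (30, "agree")]
low, high = fidelity(day)
print(round(low, 2), round(high, 2))  # 0.66 0.76
```

The gap between the two values corresponds to the length of the vertical segment for that day: the share of time on which agreement could not be decided either way.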

We  found  that  the  more  diary  entries  the  subject  made  per  day,  the  greater  the  agreement  between  the 
observer  and  subject.  Figure  5  compares  the  number  of  entries  made  by  a  given  subject  with  the  number  of 
daily  activity  blocks  extracted  from  the  observation  notes  for  each  day  of  observation. 

Of  course,  some  days  are  more  eventful  than  others,  but,  as  indicated  in  Figure  5,  a  developer's  day  can, 
on  average,  be  segmented  into  about  12  different  activity  blocks  (from  the  observer's  perspective).  The 
developers,  however,  varied  in  the  level  of  detail  they  applied  to  their  diaries.  Subject  2B,  for  example, 
always noted one major activity whereas 1B and 1C's level of detail approximately equaled that of the
observer. Interestingly, the time questionnaires also indicated that 1B and 1C were the most monochronic
of  the  study  subjects. 

When  we  examined  the  content  of  what  developers  were  or  were  not  reporting,  the  major  source  of 
variation  was  the  large  number  of  unexpected  events  that  occurred  during  the  course  of  a  developer's  day. 


12. Two independent researchers compared each of the activity blocks noted by the observer with the subject's diary entries that day
and assigned each such block to one of three categories: agree, disagree, or maybe. The length of the line segment is the amount of
time in the day where we could not definitively decide whether there was agreement or disagreement between the two reports.

Figure 4 Rate of Agreement Between Observer and Subjects

The  fidelity  rate,  calculated  as  the  number  of  minutes  that  the  subject  and  observer  agree 
about  what  the  subject  was  actually  doing  divided  by  the  total  number  of  minutes  worked 
that  day,  is  plotted  by  the  date  of  observation.  The  vertical  lines  represent  the  confidence 
bounds around each estimate. The fidelity varies between 0.95 and 0.58, and the
between-subject variation is clearly greater than the within-subject variation.


(See,  for  example,  the  two  afternoon  entries  of  the  right  hand  side  of  Figure  2).  Although  most  developers 
did  not  record  such  interruptions  in  their  retrospective  reports,  they  were  a  ubiquitous  part  of  the 
development  process  from  the  observers'  perspective.  In  the  next  section,  we  use  the  observation  data  to 
quantify  this  qualitative  impression  that  developers  are  frequently  interrupted. 

5.  Communication  Analysis 


Gerald  Weinberg  once  posed  the  provocative  question,  "Does  it  matter  how  many  people  a  software 
developer  runs  into  during  the  day?"  [Weinb71]  He  argued  that  although  the  task  of  writing  code  is  usually 
assigned to an individual, the end product will inevitably reflect the input of others.¹³ Others have noted
that programs have a dual nature: they can be executed for effect and they can be read as communicative
entities [Solow84]. Both points reflect the fact, acknowledged by theorists, that information flow is a
critical factor in organizational success [Allen77].


13. Note the distinction between what developers say they prefer and how they actually behave. According to Weinberg, most
programmers would probably claim to prefer to work alone in an undisturbed environment, yet he estimated that they actually
spend about 2/3 of their time working with others. Similarly, we observed a tremendous amount of voluntary collaboration among
the developers in our study despite frequent protests of "too many interruptions." Although we cannot distinguish whether the
behavior is driven by preferences or the requirements of a complex task, it does highlight the false portrayal of programming as an
isolated type of activity.

Figure 5 Comparison of Number of Entries

This plot compares the number of major activity blocks observed versus the number of
diary entries recorded by each developer. We found that the closer the ratio of the two
numbers was to 1 (represented by the dashed line), the higher the fidelity of the self-
reporting.

Most  communication  studies,  however,  address  a  very  narrow  range  of  interactions  that  occur  in  the 
course  of  a  collaborative  work  effort.  For  example,  they  are  typically  restricted  to  only  one  media  channel 
or  focus  on  exchanges  that  are  planned-in-advance  and  of  relatively  long  duration  [Kraut90].  Moreover,  the 
empirical  data  often  consists  of  asking  subjects  who  they  talk  to  the  most  and  thus  risks  confounding 
frequency with duration or impact [Baum90; Bem80]. The present study offers a unique opportunity to
address such deficiencies in that it tracks all communication activity, at the individual level, across multiple
media channels.

5.1 Description of Data


Figure  6  presents  a  sample  of  the  communication  summary  sheet  we  prepared  on  each  subject,  based  on 
the  daily  observations  of  their  interactions  across  four  major  channels  (audix,  electronic  mail,  phone  and 
in-person visits).¹⁴ Drawing on the methodology of [Wolf93] we have broken each interaction down in
terms  of  whether  it  was  sent  or  received  by  the  study  subject. 


SUBJECT ID

            Audix           Email           Phone           Visit
         Sent  Recv'd    Sent  Recv'd    Sent  Recv'd    Sent  Recv'd    Uniq

Day1
Day2
Day3
Day4
Day5

Figure  6  Communication  Summary  Sheet  Example 

This  interim  sheet  summarizes  the  communication  messages  a  software  developer  sent 
and  received  across  four  media  channels  during  five  days  of  observation.  The  table  entries 
contain  the  total  number  of  unique  daily  contacts  ("Uniq")  and  the  duration  and  time  of 
day  of  a  particular  exchange. 
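
One way to represent the data behind such a summary sheet is sketched below: each observed exchange becomes a record, and a daily row is built by counting messages per channel and direction and collecting the distinct contacts. The class name, field names, and contact identifiers are hypothetical and introduced only for illustration; the sheet itself was prepared by hand from the observation notes.

```python
from collections import defaultdict
from dataclasses import dataclass

CHANNELS = ("audix", "email", "phone", "visit")


@dataclass(frozen=True)
class Interaction:
    day: int          # observation day, 1..5
    channel: str      # one of CHANNELS
    direction: str    # "sent" or "recv", relative to the study subject
    contact: str      # hypothetical identifier for the other party
    minutes: float    # duration of the exchange


def summarize(interactions):
    """Build one row per day: counts per (channel, direction) plus unique contacts."""
    counts = defaultdict(lambda: defaultdict(int))
    contacts = defaultdict(set)
    for it in interactions:
        if it.channel not in CHANNELS:
            raise ValueError(f"unknown channel: {it.channel}")
        counts[it.day][(it.channel, it.direction)] += 1
        contacts[it.day].add(it.contact)
    return {
        day: {"counts": dict(counts[day]), "uniq": len(contacts[day])}
        for day in sorted(counts)
    }


# Hypothetical log for a single day.
log = [
    Interaction(1, "visit", "recv", "peer_a", 3),
    Interaction(1, "visit", "sent", "peer_a", 6),
    Interaction(1, "phone", "recv", "lab", 2),
    Interaction(1, "email", "recv", "peer_b", 1),
]
row = summarize(log)[1]
print(row["uniq"], row["counts"][("visit", "recv")])  # 3 1
```

Keeping the raw records rather than only the tallies means the same log can later be regrouped by duration or time of day.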


The form and content of the interactions recorded here are readily recognizable by anyone who has
worked in a large corporate setting. Perhaps best described as "on-the-fly" exchanges [Kraut90], they
usually involved little formal preparation and little reliance on pre-written documentation, diagrams or
notes.¹⁵ For example, a developer often received a call from the lab about a testing problem that needed
immediate attention or had to respond to requests for authorization to change code that he was responsible
for. Several of our developers had worked in other departments and therefore had to field questions from
their former colleagues (this declines over time, but one of our subjects who had transferred departments
approximately 2 months earlier received, on average, one call a day from his former group). Finally, there
existed a large amount of unplanned interaction with colleagues: requests to informally review code,
questions about a particular tool, or general problem solving and debriefing sessions.


14. Paper documentation is practically non-existent in this organization. This is partly due to the firm's interpretation of ISO
requirements: if all documentation is on-line, the possibility of people using out-dated versions is decreased.

15. Note that we did not include contacts made in (scheduled) meetings or in the laboratory; nor did we include purely social
exchanges (e.g., a call to a wife/husband, lunch partners). The unique count does not include faceless administrators.

Figure 7 Number of Unique Contacts Per Subject Per Day

The  number  of  unique  person  contacts  for  each  subject  per  day.  This  count  reflects 
interactions  across  four  media  channels  (audix,  email,  phone  and  in-person  visits)  but 
does  not  include  contacts  made  during  meetings  or  lab  testing.  Nor  does  it  include  social 
exchanges.  The  median  over  all  subjects  is  7  (last  boxplot).  The  outliers  primarily  reflect 
days  in  which  a  subject  was  working  on  modification  to  existing  code. 


5.2  Observations 


How  many  people  do  these  programmers  interact  with  during  a  typical  working  day?  Figure  7  presents 
a  boxplot  diagram  depicting  the  number  of  unique  daily  contacts  over  5  days  of  observation  for  each  of  our 
study  subjects.  A  boxplot  serves  as  an  excellent  and  efficient  means  to  convey  certain  prominent  features  of 
a  distribution.  Each  set  of  data  is  represented  by  a  box,  the  height  of  which  corresponds  to  the  spread  of  the 
bulk  of  the  data  (the  central  50%),  with  the  upper  and  lower  ends  of  the  box  being  the  upper  and  lower 
quartiles.  The  data  median  is  denoted  by  a  bold  point  within  the  box.  The  lengths  of  the  vertical  dashed 
lines  relative  to  the  box  indicate  how  stretched  the  tails  of  the  distribution  are;  they  extend  to  the  standard 
range  of  the  data,  defined  as  1.5  times  the  inter-quartile  range.  The  detached  points  are  "outliers"  lying 
beyond this range [Chamb83]. As depicted by the far right boxplot, the median number of unique contacts,
across all study subjects, was seven per day. The median number of interactions per day is already straining
what is widely believed to be people's cognitive limitation [Miller56].

Subject 2D stands out in contrast to the rest of the sample with a median number of 11 daily contacts.
We attribute this primarily to office layout (he shared space with two other people and their local coffee
machine drew a lot of traffic).¹⁶ This subject's Myers-Briggs personality profile also indicated a strong
preference for social interaction.

The  outliers  in  the  boxplot  diagram  are  particularly  interesting.  The  highest  point  (17  unique  contacts) 
represents a day in which developer 2C started to work on a code modification motivated by a customer
field  request.  The  other  outliers  also  correspond  to  modifications  of  existing  code,  and  in  each  case,  the 
number  of  unique  interfaces  approximately  doubled  from  the  baseline  of  7.  The  majority  of  these  contacts 
were  requests  for  authorization  to  change  code  owned  by  another  developer.  Just  slightly  less  frequent  were 
calls  to  a  help  desk  for  passwords  or  information  about  a  particular  release  of  the  software;  calls  to  the  lab 
requesting  available  time  slots  for  testing;  and  exchanges  with  peers  about  process  procedures  in  general. 
Note  that  these  contacts  were  not  technically  related  per  se.  That  is,  the  solution  was  usually  not  the 
motivating  issue  driving  this  behavior.  Rather,  the  developers  needed  help  implementing  the  solution. 
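
The boxplot conventions described above (quartiles, a range of 1.5 times the inter-quartile range beyond the box, and detached outliers) can be sketched as follows. The contact counts are hypothetical, and the quartile interpolation rule (Python's "inclusive" method) is an assumption; other plotting packages may compute quartiles slightly differently.

```python
from statistics import median, quantiles


def boxplot_stats(values):
    """Return median, quartiles, 1.5*IQR fences and outliers for one box."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {
        "median": median(values),
        "q1": q1,
        "q3": q3,
        "fences": (low_fence, high_fence),
        "outliers": outliers,
    }


# Hypothetical daily unique-contact counts, including one unusually busy day.
contacts = [5, 6, 7, 7, 8, 9, 17]
stats = boxplot_stats(contacts)
print(stats["median"], stats["fences"], stats["outliers"])  # 7 (3.5, 11.5) [17]
```

In this made-up sample the day with 17 contacts falls outside the upper fence and would be drawn as a detached point, while the median stays at 7.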

We  next  examined  the  number  of  messages  being  sent  and  received  each  day  across  the  different  media 
channels  (Figure  8).  Note  that  the  distributions  of  sent  and  received  visits  and  phone  messages  are  both 
approximately  normal,  reassuring  us  that  the  sample  is  not  significantly  skewed  and  also  suggesting  the 
presence of reciprocal interactions.¹⁷ As noted in the far right set of boxes, a developer typically received a
total of 16 messages and sent a total of 6 messages during a working day. Ignoring email for the moment,
the  most  ubiquitous  form  of  contact  in  this  work  environment  was  in-person  visits.  They  were  used 
approximately  two  to  three  times  as  often  as  the  other  channels. 

16. What, in general, is the impact of having an office mate? Three of our subjects shared an office with a single colleague in the same
work group, three subjects had private offices and the remaining individual shared an office with two peers. Prior studies have cited
an 8-11% productivity gain with private offices [Boehm83], and although we cannot assess such a claim here, the presence of a
close peer is definitely an important driver of interruptions. Office mates often debriefed one another about meetings or local
gossip, and there was typically a large amount of "what if" or "do you know how" type of questions exchanged back and forth.

17. We do not explicitly track communication threads (a group of related communication events all devoted to the discussion of a
single problem) here [Wolf93].

Figure 8 Number of Messages Being Sent and Received Across 4 Media Channels

The number of messages being sent and received, broken down by media channel and
whether they were received or initiated by the study subject. We have applied a square
root transformation in order to stabilize the variance. Each box contains data on all 7
study subjects across 5 days of observation per individual.


One of the most surprising results concerned the usage of electronic mail. Many corporations are
starting to implement this new form of communication, and we fully expected, given the computer
intensive nature of this organization, to see a large amount of email traffic. Although our study subjects
received many such messages (a median of 9 per day), they sent very few (a median of 0 per day). What's
more, the content of these email messages was rarely technical. Most of the traffic was devoted to
organizational news (announcements of upcoming talks, recent product sales, or celebratory lunches;
congratulatory messages on a "job well done") or process related information (mostly announcements of
process changes!).
We  attribute  this  phenomenon  to  several  factors.  First,  it  is  difficult  and  time  consuming  to  coherently 
draft  a  complex  technical  question  or  response.  As  noted  by  one  developer  "Email  is  too  slow;  by  the  time 
I  type  out  a  coherent  description  of  the  problem,  I  could  have  called  or  walked  over  and  gotten  the  answer." 
Secondly, the ambiguity of software technology necessitates a type of iterative problem-solving that is
ill-suited to the email venue [Sack68; Tsai87]. Our subjects may also have been somewhat reluctant to release
a written recommendation or opinion without having control over its final distribution. Finally, the relative
maturity of this email system was undoubtedly a factor. That is, electronic mail in this development
organization appears to have evolved into something of a broadcast medium. It is the most efficient way for
the organization to distribute information to the technical population, but the very fact that such messages
have flooded the system makes developers reluctant to use it for pressing technical issues.

Figure 9 Duration Per Contact By Media Channel

The  duration  per  contact  broken  down  by  media  channel  and  whether  the  message  was 
received  or  initiated  by  the  study  subject.  We  have  applied  a  square  root  transformation  in 
order  to  stabilize  the  variance.  Each  box  contains  data  on  all  7  study  subjects  across  5 
days  of  observation  per  individual. 
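
A summary of the kind plotted here can be sketched as follows: group the durations by channel and direction, take the median of each group, and compute the share of all exchanges shorter than five minutes. The records below are hypothetical and only illustrate the bookkeeping; they are not data from the study.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (channel, direction, minutes) records.
records = [
    ("visit", "sent", 6), ("visit", "recv", 3), ("visit", "recv", 55),
    ("phone", "recv", 2), ("phone", "sent", 3), ("audix", "recv", 1),
    ("email", "recv", 1), ("email", "sent", 8),
]

# Share of all exchanges lasting less than five minutes.
short_share = sum(1 for _, _, m in records if m < 5) / len(records)

# Median duration per (channel, direction) group.
by_group = defaultdict(list)
for channel, direction, m in records:
    by_group[(channel, direction)].append(m)
medians = {group: median(ms) for group, ms in by_group.items()}

print(short_share, medians[("visit", "recv")])  # 0.625 29.0
```

With only two received visits in this toy sample, the single long visit dominates that group's median, which is why a realistic summary needs many observations per group.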


Figure 9 plots the duration of messages in each media channel. Looking across all forms of
communication, approximately 68% of the interactions are of less than 5 minutes in duration. This agrees
with research done in the early 1980s at Xerox PARC [Abel90]. It also confirms anecdotal evidence
supplied by independent studies of this population [Baum90; Kelley93]. The difference in the median
duration of a sent versus received visit (approximately 6 versus 3 minutes) is attributed to travel time.
Similarly, the fact that sent email messages are of relatively longer duration than those received
undoubtedly reflects compositional factors. Not surprisingly, audix messages are very brief (1 minute), but
phone messages are also unexpectedly short (2-3 minutes). Finally, note the existence of significant outliers
in all the media channels; visits of close to one hour and phone calls of 30 minutes are not uncommon. This
is a particularly non-intuitive result in that these are all unplanned and unanticipated forms of interaction.

Figure 10 Total Time Devoted to a Media Channel

The number of total minutes per day spent using each media type. Again, we have
applied  a  square  root  transformation  in  order  to  stabilize  the  variance.  The  total  time 
spent  in  communication  each  day  is  about  75  minutes.  Of  particular  significance  are  the 
imbalance  in  total  time  spent  receiving  versus  sending  email  and  the  large  variances  in 
time  spent  using  email  and  visiting. 


Finally,  we  have  plotted  the  total  time  spent  communicating  in  Figure  10.  The  median  total  time  spent 
using  all  four  communication  vehicles  is  about  75  minutes  per  day  (50  minutes  received  plus  25  minutes 
sent).  Of  that  total,  about  35  minutes  is  occupied  by  face-to-face  interactions.  Note,  however,  that  the 
variances  associated  with  in-person  visits  and  electronic  mail  are  especially  large.  We  attribute  this  result  to 
the fact that in-person technical discussions often digress to include social topics.¹⁸ Similarly, reading
email seems to be a form of communication that easily expands to fill the time allocated to it; something a
developer might do at the end of the day or to kill time before lunch.

6. Conclusions

The  primary  motivation  behind  this  study  was  to  answer  the  question:  are  time  diaries  a  reliable  way  to 
capture  process  information?  In  so  doing,  we  experimented  with  two  relatively  uncommon  forms  of  data 
collection  in  the  software  development  field:  time  diaries  and  direct  observation.  We  concluded  that  both 
methods  of  research  are  feasible  and  useful,  depending  on  the  questions  one  wishes  to  address.  We 
acknowledge,  however,  that  neither  method  can  satisfactorily  assess  the  subject's  level  of  concentration  or 
engagement with the task. This is an important component which is beyond the immediate scope of this
study. 

In  addition  to  exploring  new  methodology,  however,  we  also  sought  to  investigate  under-developed 
arenas  in  software  research:  the  social  structure,  environment  and  culture  of  a  real  organization  of  software 
developers.  It  is  our  belief  that  all  three  elements  (organization,  process  and  technology)  need  to  be 
addressed  in  order  to  obtain  a  complete  picture  of  the  development  process. 

Indeed, we found that elements of the organization were as important as (if not more important than)
technology. In particular, the data on the number of inter-personal contacts a software developer needs to
make  during  a  typical  working  day  strongly  suggests  that  technical  problems  are  not  the  real  issue  in  this 
organization.  Rather,  these  software  developers  need  to  apply  just  as  much  effort  and  attention  to  determine 
who  to  contact  within  their  organization  in  order  to  get  their  work  done.  Most  importantly,  we  were  able  to 
quantify  what  had  previously  been  predominately  qualitative  impressions  about  life  as  a  software  developer 
in  this  firm. 

7.  Future  Research 

As  noted  by  the  organizational  theorists  March  and  Olsen,  the  empirical  focus  on  time  allocation  is 
"deceptively  mundane"  [March76].  The  prosaicness  lies  in  the  necessity  of  developing  not  only  aggregate 
statistics, as we have done here, but also some explanation of the underlying process. In subsequent
research we plan to identify some of the structural determinants of the allocation of time within this
organization.


18. Although we did not observe a tremendous amount of socializing, exchanges often combined technical and personal
communication.

An additional important area for further study concerns the individual and organizational costs, and
benefits,  of  these  patterns  of  time  usage  and  communication.  Interruptions  obviously  affect  an  individual's 
level  of  concentration,  yet  they  may  also  serve  important  educational  and  motivational  roles  within  the 
organization. 

As  technology  progresses,  new  forms  of  instrumentation  are  rapidly  becoming  available  to  field 
researchers.  Some  such  advances  are  enabling  researchers  to  capture  information  at  a  very  fine  level  of 
detail,  while  others  are  easing  the  manual  burden  associated  with  empirical  studies.  Yet,  surprisingly,  we 
see  very  little  experimentation  with  these  new  capabilities;  most  field  researchers  still  rely  on  interviews  or 
traditional  paper  surveys.  We  believe  other  approaches,  of  varying  levels  of  technical  sophistication,  are 
called  for,  particularly  in  the  case  of  software  development.  For  example,  we  have  considered  giving 
developers  a  device  similar  to  a  Sharp  Wizard™  and  asking  them  to  record  their  daily  activities  on  it  (a 
more  interactive  type  of  diary).  Or  one  might  prompt  subjects  to  record  their  immediate  activity  by 
randomly  paging  them  throughout  the  day.  (A  combination  of  the  above  two  methods,  targeted  by 
personality,  would  probably  be  most  efficient).  Not  only  will  such  experiments  improve  our  existing 
knowledge  about  software  development,  but  they  also  offer  the  prospect  of  access  to  previously  unexplored 
research questions.